
DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting (2305.19957v2)

Published 31 May 2023 in cs.CV

Abstract: End-to-end text spotting aims to integrate scene text detection and recognition into a unified framework. Dealing with the relationship between the two sub-tasks plays a pivotal role in designing effective spotters. Although Transformer-based methods eliminate the heuristic post-processing, they still suffer from the synergy issue between the sub-tasks and low training efficiency. Besides, they overlook the exploring on multilingual text spotting which requires an extra script identification task. In this paper, we present DeepSolo++, a simple DETR-like baseline that lets a single decoder with explicit points solo for text detection, recognition, and script identification simultaneously. Technically, for each text instance, we represent the character sequence as ordered points and model them with learnable explicit point queries. After passing a single decoder, the point queries have encoded requisite text semantics and locations, thus can be further decoded to the center line, boundary, script, and confidence of text via very simple prediction heads in parallel. Furthermore, we show the surprisingly good extensibility of our method, in terms of character class, language type, and task. On the one hand, our method not only performs well in English scenes but also masters the transcription with complex font structure and a thousand-level character classes, such as Chinese. On the other hand, our DeepSolo++ achieves better performance on the additionally introduced script identification task with a simpler training pipeline compared with previous methods. In addition, our models are also compatible with line annotations, which require much less annotation cost than polygons. The code is available at \url{https://github.com/ViTAE-Transformer/DeepSolo}.

Overview of DeepSolo++: An Efficient Solution for Multilingual Text Spotting

The paper "DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting" introduces an approach that streamlines and strengthens multilingual text spotting. It tackles the complex task of integrating text detection, recognition, and script identification into a unified framework, using a simplified architecture inspired by the DETR paradigm. The proposed model, DeepSolo++, targets high performance in multilingual settings with a single, straightforward Transformer-based architecture.

Key Contributions

  1. Explicit Point Query Design: The authors introduce an explicit point query representation derived from Bezier center curves. This query form concisely encodes the position, shape, and semantics of each text instance, allowing detection, recognition, and script identification to be handled by a single decoder.
  2. Simplification of the Text Spotting Pipeline: DeepSolo++ removes the heuristic post-processing and intermediate components common in earlier architectures, such as RoI-based feature extraction and complex language prediction networks. This reduction in architectural complexity improves training efficiency and robustness, particularly in scenarios with weak annotations.
  3. Comprehensive Multilingual Capability: The model demonstrates strong extensibility in handling various text scripts through a multilingual routing mechanism, leveraging a simple script token to facilitate script identification and appropriate routing for character classification. The paper validates the model's effectiveness on multiple challenging datasets, particularly highlighting its ability to handle diverse character classes and complex script structures like Chinese.
  4. Strong Performance Metrics: DeepSolo++ achieves state-of-the-art results across various monolingual and multilingual benchmarks. Notably, on the ICDAR 2019 MLT dataset, the model improves joint detection and script identification by 5.5% H-mean and 8.0% AP, and gains 2.7% H-mean in end-to-end text spotting.
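
The explicit point queries in contribution 1 start from points sampled along a Bezier center curve. A minimal, self-contained sketch of that sampling step, using the standard Bernstein-polynomial form of a Bezier curve (the control points and sample count here are illustrative, not taken from the paper's configuration):

```python
from math import comb  # binomial coefficients for Bernstein polynomials

def bezier_points(ctrl, n):
    """Sample n ordered points, uniform in t, along a Bezier curve.

    ctrl: list of (x, y) control points (4 for a cubic center curve).
    Returns n (x, y) points from t=0 to t=1.
    """
    d = len(ctrl) - 1  # curve degree
    pts = []
    for i in range(n):
        t = i / (n - 1) if n > 1 else 0.0
        # Bernstein basis: B_{k,d}(t) = C(d,k) * (1-t)^(d-k) * t^k
        x = sum(comb(d, k) * (1 - t) ** (d - k) * t ** k * cx
                for k, (cx, _) in enumerate(ctrl))
        y = sum(comb(d, k) * (1 - t) ** (d - k) * t ** k * cy
                for k, (_, cy) in enumerate(ctrl))
        pts.append((x, y))
    return pts

# A flat cubic center line from x=0 to x=3: the 5 sampled points stay
# on y=0 and run left to right in order.
line = bezier_points([(0, 0), (1, 0), (2, 0), (3, 0)], 5)
```

In the actual model these sampled points seed the learnable point queries; the decoder then refines them into the center line, boundary, script, and transcription outputs.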

Practical and Theoretical Implications

The practical implications of DeepSolo++ are substantial: the model provides a flexible, cost-effective solution for real-world applications that require multilingual text recognition and identification. Its efficiency and simplicity lower the barrier to deploying sophisticated text spotting systems in settings such as intelligent navigation and multilingual information retrieval.

From a theoretical perspective, this research deepens the understanding of how Transformer architectures handle complex detection tasks, showing that point-based representations can both simplify the pipeline and improve performance. These insights could guide future Transformer models in other domains that require integrated detection and recognition.

Speculative Future Directions

Future developments inspired by this research could involve further exploration of the synergy between text representation, detection, and recognition architectures. Possible areas of focus include refining the Transformer architecture for better long-tail recognition, integrating more powerful LLMs to enhance recognition accuracy, and tailoring the encoder-decoder mechanism for dynamic adaptation to varying text scripts and structures.

Additionally, investigating solutions to the current challenges in inverse-like text detection and recognition could result in a more robust text spotting framework capable of handling even more diverse real-world scenarios. The paper provides a foundational step towards realizing a comprehensive, efficient, and versatile text spotting solution fit for a multitude of languages and scripts.

[2019] Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: ICCV, pp. 9126–9136 (2019) Wang et al. [2021] Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021) Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. 
[2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 
4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 
2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., Yin, F., Zhang, X.-Y., He, W., Liu, C.-L.: Residual dual scale scene text spotting by fusing bottom-up and top-down processing. International Journal of Computer Vision 129, 619–637 (2021) Wang et al. [2021] Wang, W., Xie, E., Li, X., Liu, X., Liang, D., Yang, Z., Lu, T., Shen, C.: Pan++: Towards efficient and accurate end-to-end spotting of arbitrarily-shaped text. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(9), 5349–5367 (2021) Liao et al. [2020] Liao, M., Pang, G., Huang, J., Hassner, T., Bai, X.: Mask textspotter v3: Segmentation proposal network for robust scene text spotting. In: ECCV, pp. 706–722 (2020) Ronen et al. [2022] Ronen, R., Tsiper, S., Anschel, O., Lavi, I., Markovitz, A., Manmatha, R.: Glass: global to local attention for scene-text spotting. 
In: ECCV, pp. 249–266 (2022) Zhong et al. [2021] Zhong, H., Tang, J., Wang, W., Yang, Z., Yao, C., Lu, T.: Arts: Eliminating inconsistency between text detection and recognition with auto-rectification text spotter. arXiv preprint arXiv:2110.10405 (2021) Huang et al. [2022] Huang, M., Liu, Y., Peng, Z., Liu, C., Lin, D., Zhu, S., Yuan, N., Ding, K., Jin, L.: Swintextspotter: Scene text spotting via better synergy between text detection and text recognition. In: CVPR, pp. 4593–4603 (2022) Xing et al. [2019] Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: ICCV, pp. 9126–9136 (2019) Wang et al. [2021] Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021) Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. 
[2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. 
[2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Wang et al. [2021] Wang, W., Xie, E., Li, X., Liu, X., Liang, D., Yang, Z., Lu, T., Shen, C.: Pan++: Towards efficient and accurate end-to-end spotting of arbitrarily-shaped text. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(9), 5349–5367 (2021)
Liao et al. [2020] Liao, M., Pang, G., Huang, J., Hassner, T., Bai, X.: Mask textspotter v3: Segmentation proposal network for robust scene text spotting. In: ECCV, pp. 706–722 (2020)
Ronen et al. [2022] Ronen, R., Tsiper, S., Anschel, O., Lavi, I., Markovitz, A., Manmatha, R.: Glass: Global to local attention for scene-text spotting. In: ECCV, pp. 249–266 (2022)
Zhong et al. [2021] Zhong, H., Tang, J., Wang, W., Yang, Z., Yao, C., Lu, T.: Arts: Eliminating inconsistency between text detection and recognition with auto-rectification text spotter. arXiv preprint arXiv:2110.10405 (2021)
Huang et al. [2022] Huang, M., Liu, Y., Peng, Z., Liu, C., Lin, D., Zhu, S., Yuan, N., Ding, K., Jin, L.: Swintextspotter: Scene text spotting via better synergy between text detection and text recognition. In: CVPR, pp. 4593–4603 (2022)
Xing et al. [2019] Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: ICCV, pp. 9126–9136 (2019)
Wang et al. [2021] Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021)
Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt: An unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018)
Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020)
Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021)
Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023)
Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017)
Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: ICLR (2020)
Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021)
Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021)
Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022)
Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022)
Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023)
He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017)
Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition-rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022)
Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al.
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhong, H., Tang, J., Wang, W., Yang, Z., Yao, C., Lu, T.: Arts: Eliminating inconsistency between text detection and recognition with auto-rectification text spotter. arXiv preprint arXiv:2110.10405 (2021) Huang et al. [2022] Huang, M., Liu, Y., Peng, Z., Liu, C., Lin, D., Zhu, S., Yuan, N., Ding, K., Jin, L.: Swintextspotter: Scene text spotting via better synergy between text detection and text recognition. In: CVPR, pp. 4593–4603 (2022) Xing et al. [2019] Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: ICCV, pp. 9126–9136 (2019) Wang et al. [2021] Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021) Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. 
[2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. 
[2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. 
[2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Huang, M., Liu, Y., Peng, Z., Liu, C., Lin, D., Zhu, S., Yuan, N., Ding, K., Jin, L.: Swintextspotter: Scene text spotting via better synergy between text detection and text recognition. In: CVPR, pp. 4593–4603 (2022) Xing et al. [2019] Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: ICCV, pp. 9126–9136 (2019) Wang et al. [2021] Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021) Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. 
[2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. 
[2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. 
[2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 
369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 
[2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. 
[2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. 
[2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. 
[2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 
8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 
9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. 
[2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 
1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
[2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. 
[2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., Yin, F., Zhang, X.-Y., He, W., Liu, C.-L.: Residual dual scale scene text spotting by fusing bottom-up and top-down processing. International Journal of Computer Vision 129, 619–637 (2021) Wang et al. [2021] Wang, W., Xie, E., Li, X., Liu, X., Liang, D., Yang, Z., Lu, T., Shen, C.: Pan++: Towards efficient and accurate end-to-end spotting of arbitrarily-shaped text. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(9), 5349–5367 (2021) Liao et al. [2020] Liao, M., Pang, G., Huang, J., Hassner, T., Bai, X.: Mask textspotter v3: Segmentation proposal network for robust scene text spotting. In: ECCV, pp. 706–722 (2020) Ronen et al. [2022] Ronen, R., Tsiper, S., Anschel, O., Lavi, I., Markovitz, A., Manmatha, R.: Glass: global to local attention for scene-text spotting. In: ECCV, pp. 249–266 (2022) Zhong et al. [2021] Zhong, H., Tang, J., Wang, W., Yang, Z., Yao, C., Lu, T.: Arts: Eliminating inconsistency between text detection and recognition with auto-rectification text spotter. arXiv preprint arXiv:2110.10405 (2021) Huang et al. [2022] Huang, M., Liu, Y., Peng, Z., Liu, C., Lin, D., Zhu, S., Yuan, N., Ding, K., Jin, L.: Swintextspotter: Scene text spotting via better synergy between text detection and text recognition. In: CVPR, pp. 4593–4603 (2022) Xing et al. [2019] Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: ICCV, pp. 9126–9136 (2019) Wang et al. [2021] Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021) Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. 
[2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. 
[2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017)
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Wang, W., Xie, E., Li, X., Liu, X., Liang, D., Yang, Z., Lu, T., Shen, C.: PAN++: Towards efficient and accurate end-to-end spotting of arbitrarily-shaped text. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(9), 5349–5367 (2021)
Liao, M., Pang, G., Huang, J., Hassner, T., Bai, X.: Mask TextSpotter v3: Segmentation proposal network for robust scene text spotting. In: ECCV, pp. 706–722 (2020)
Ronen, R., Tsiper, S., Anschel, O., Lavi, I., Markovitz, A., Manmatha, R.: GLASS: Global to local attention for scene-text spotting. In: ECCV, pp. 249–266 (2022)
Zhong, H., Tang, J., Wang, W., Yang, Z., Yao, C., Lu, T.: ARTS: Eliminating inconsistency between text detection and recognition with auto-rectification text spotter. arXiv preprint arXiv:2110.10405 (2021)
Huang, M., Liu, Y., Peng, Z., Liu, C., Lin, D., Zhu, S., Yuan, N., Ding, K., Jin, L.: SwinTextSpotter: Scene text spotting via better synergy between text detection and text recognition. In: CVPR, pp. 4593–4603 (2022)
Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: ICCV, pp. 9126–9136 (2019)
Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: PGNet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021)
Bušta, M., Patel, Y., Matas, J.: E2E-MLT: An unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018)
Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020)
Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual OCR. In: CVPR, pp. 4547–4557 (2021)
Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: ECCV 2022 Workshops, Part IV, pp. 297–313 (2023)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017)
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: ICLR (2020)
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021)
Xu, Y., Zhang, Q., Zhang, J., Tao, D.: ViTAE: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021)
Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022)
Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022)
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020)
Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: DPText-DETR: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV, pp. 2961–2969 (2017)
Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: ABCNet v2: Adaptive Bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: SPTS: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019)
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification (RRC-MLT). In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition (RRC-MLT-2019). In: ICDAR, pp. 1582–1587 (2019)
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Society (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019)
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019)
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022)
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhong, H., Tang, J., Wang, W., Yang, Z., Yao, C., Lu, T.: Arts: Eliminating inconsistency between text detection and recognition with auto-rectification text spotter. arXiv preprint arXiv:2110.10405 (2021) Huang et al. [2022] Huang, M., Liu, Y., Peng, Z., Liu, C., Lin, D., Zhu, S., Yuan, N., Ding, K., Jin, L.: Swintextspotter: Scene text spotting via better synergy between text detection and text recognition. In: CVPR, pp. 4593–4603 (2022) Xing et al. [2019] Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: ICCV, pp. 9126–9136 (2019) Wang et al. 
[2021] Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021) Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. 
[2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. 
[2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Huang, M., Liu, Y., Peng, Z., Liu, C., Lin, D., Zhu, S., Yuan, N., Ding, K., Jin, L.: Swintextspotter: Scene text spotting via better synergy between text detection and text recognition. In: CVPR, pp. 4593–4603 (2022) Xing et al. [2019] Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: ICCV, pp. 9126–9136 (2019) Wang et al. [2021] Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021) Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. 
[2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. 
[2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 
9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR 2017 robust reading challenge on multi-lingual scene text detection and script identification (RRC-MLT). In: ICDAR, vol. 1, pp. 1454–1459 (2017)
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. 
[2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 
369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 
1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. 
[2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV, pp. 2961–2969 (2017)
Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: ABCNet v2: Adaptive Bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: SPTS: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification – RRC-MLT. In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition – RRC-MLT-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text – RRC-ArT. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling – RRC-LSVT. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 
3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. 
International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ronen, R., Tsiper, S., Anschel, O., Lavi, I., Markovitz, A., Manmatha, R.: Glass: global to local attention for scene-text spotting. In: ECCV, pp. 249–266 (2022) Zhong et al. [2021] Zhong, H., Tang, J., Wang, W., Yang, Z., Yao, C., Lu, T.: Arts: Eliminating inconsistency between text detection and recognition with auto-rectification text spotter. arXiv preprint arXiv:2110.10405 (2021) Huang et al. [2022] Huang, M., Liu, Y., Peng, Z., Liu, C., Lin, D., Zhu, S., Yuan, N., Ding, K., Jin, L.: Swintextspotter: Scene text spotting via better synergy between text detection and text recognition. In: CVPR, pp. 4593–4603 (2022) Xing et al. [2019] Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: ICCV, pp. 9126–9136 (2019) Wang et al. [2021] Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021) Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. 
[2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. 
[2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. 
International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 
297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. 
[2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. 
[2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. 
Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification (RRC-MLT). In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition (RRC-MLT-2019). In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn [1955] Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: ICLR (2020)
Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021)
Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: ViTAE: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021)
Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022)
Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022)
Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: DPText-DETR: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023)
He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV, pp. 2961–2969 (2017)
Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: ABCNet v2: Adaptive Bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: SPTS: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. 
[2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 
369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 
1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 
2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. 
[2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 
[2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. 
[2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 
8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 
1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 
19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 
3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. 
In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ronen, R., Tsiper, S., Anschel, O., Lavi, I., Markovitz, A., Manmatha, R.: Glass: global to local attention for scene-text spotting. In: ECCV, pp. 249–266 (2022) Zhong et al. [2021] Zhong, H., Tang, J., Wang, W., Yang, Z., Yao, C., Lu, T.: Arts: Eliminating inconsistency between text detection and recognition with auto-rectification text spotter. arXiv preprint arXiv:2110.10405 (2021) Huang et al. [2022] Huang, M., Liu, Y., Peng, Z., Liu, C., Lin, D., Zhu, S., Yuan, N., Ding, K., Jin, L.: Swintextspotter: Scene text spotting via better synergy between text detection and text recognition. In: CVPR, pp. 
[2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. 
[2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: ICCV, pp. 9126–9136 (2019) Wang et al. [2021] Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021) Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 
67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. 
[2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
[2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. 
[2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. 
[2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 
2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022)
Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022)
Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022)
Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023)
He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017)
Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. 
[2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
[2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. 
[2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. 
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text – RRC-ArT. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling – RRC-LSVT. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification – RRC-MLT. In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition – RRC-MLT-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 
414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 
1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 
9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. 
[2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 
3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
  5. Wang, W., Xie, E., Li, X., Liu, X., Liang, D., Yang, Z., Lu, T., Shen, C.: Pan++: Towards efficient and accurate end-to-end spotting of arbitrarily-shaped text. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(9), 5349–5367 (2021) Liao et al. [2020] Liao, M., Pang, G., Huang, J., Hassner, T., Bai, X.: Mask textspotter v3: Segmentation proposal network for robust scene text spotting. In: ECCV, pp. 706–722 (2020) Ronen et al. [2022] Ronen, R., Tsiper, S., Anschel, O., Lavi, I., Markovitz, A., Manmatha, R.: Glass: global to local attention for scene-text spotting. In: ECCV, pp. 249–266 (2022) Zhong et al. [2021] Zhong, H., Tang, J., Wang, W., Yang, Z., Yao, C., Lu, T.: Arts: Eliminating inconsistency between text detection and recognition with auto-rectification text spotter. arXiv preprint arXiv:2110.10405 (2021) Huang et al. [2022] Huang, M., Liu, Y., Peng, Z., Liu, C., Lin, D., Zhu, S., Yuan, N., Ding, K., Jin, L.: Swintextspotter: Scene text spotting via better synergy between text detection and text recognition. In: CVPR, pp. 4593–4603 (2022) Xing et al. [2019] Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: ICCV, pp. 9126–9136 (2019) Wang et al. [2021] Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021) Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. 
[2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Pang, G., Huang, J., Hassner, T., Bai, X.: Mask textspotter v3: Segmentation proposal network for robust scene text spotting. In: ECCV, pp. 706–722 (2020) Ronen et al. [2022] Ronen, R., Tsiper, S., Anschel, O., Lavi, I., Markovitz, A., Manmatha, R.: Glass: global to local attention for scene-text spotting. In: ECCV, pp. 249–266 (2022) Zhong et al. [2021] Zhong, H., Tang, J., Wang, W., Yang, Z., Yao, C., Lu, T.: Arts: Eliminating inconsistency between text detection and recognition with auto-rectification text spotter. arXiv preprint arXiv:2110.10405 (2021) Huang et al. [2022] Huang, M., Liu, Y., Peng, Z., Liu, C., Lin, D., Zhu, S., Yuan, N., Ding, K., Jin, L.: Swintextspotter: Scene text spotting via better synergy between text detection and text recognition. In: CVPR, pp. 4593–4603 (2022) Xing et al. [2019] Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: ICCV, pp. 9126–9136 (2019) Wang et al. [2021] Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021) Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. 
[2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. 
[2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. 
[2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ronen, R., Tsiper, S., Anschel, O., Lavi, I., Markovitz, A., Manmatha, R.: Glass: global to local attention for scene-text spotting. In: ECCV, pp. 249–266 (2022) Zhong et al. [2021] Zhong, H., Tang, J., Wang, W., Yang, Z., Yao, C., Lu, T.: Arts: Eliminating inconsistency between text detection and recognition with auto-rectification text spotter. arXiv preprint arXiv:2110.10405 (2021) Huang et al. [2022] Huang, M., Liu, Y., Peng, Z., Liu, C., Lin, D., Zhu, S., Yuan, N., Ding, K., Jin, L.: Swintextspotter: Scene text spotting via better synergy between text detection and text recognition. In: CVPR, pp. 4593–4603 (2022) Xing et al. [2019] Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: ICCV, pp. 9126–9136 (2019) Wang et al. [2021] Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021) Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. 
[2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020)
Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021)
Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops, Proceedings, Part IV, pp. 297–313 (2023)
Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017)
Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: ICLR (2020)
Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021)
Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021)
Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022)
Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022)
Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023)
He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017)
Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition-rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022)
Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Zhong et al. [2021] Zhong, H., Tang, J., Wang, W., Yang, Z., Yao, C., Lu, T.: Arts: Eliminating inconsistency between text detection and recognition with auto-rectification text spotter. arXiv preprint arXiv:2110.10405 (2021)
Huang et al. [2022] Huang, M., Liu, Y., Peng, Z., Liu, C., Lin, D., Zhu, S., Yuan, N., Ding, K., Jin, L.: Swintextspotter: Scene text spotting via better synergy between text detection and text recognition. In: CVPR, pp. 4593–4603 (2022)
Xing et al. [2019] Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: ICCV, pp. 9126–9136 (2019)
Wang et al. [2021] Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021)
Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt: an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018)
In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. 
[2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. 
[2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. 
[2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. 
[2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. 
[2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: RelaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021)
Xu, Y., Zhang, Q., Zhang, J., Tao, D.: ViTAE: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021)
Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022)
Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022)
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020)
Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: DPText-DETR: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV, pp. 2961–2969 (2017)
Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: ABCNet v2: Adaptive Bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: SPTS: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification (RRC-MLT). In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition (RRC-MLT-2019). In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Society (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. 
[2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 
369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 
1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. 
[2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR 2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR 2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV, pp. 2961–2969 (2017)
Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: ABCNet v2: Adaptive Bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: SPTS: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR 2017 robust reading challenge on multi-lingual scene text detection and script identification (RRC-MLT). In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: ICDAR 2019 robust reading challenge on multi-lingual scene text detection and recognition (RRC-MLT-2019). In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
[2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. 
[2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
[2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. 
International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. 
[2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 
17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. 
IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 
9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. 
[2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 
3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 
67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. 
[2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. 
[2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 
369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 
1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 
67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. 
[2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
[2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: DPText-DETR: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023)
He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV, pp. 2961–2969 (2017)
Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: ABCNet v2: Adaptive Bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: SPTS: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification (RRC-MLT). In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition (RRC-MLT-2019). In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013)
Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn [1955] Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: ECCV 2022 Workshops, Part IV, pp. 297–313 (2023)
Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017)
Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: ICLR (2020)
Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021)
Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: ViTAE: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021)
Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp.
9519–9528 (2022)
Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022)
Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020)
1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. 
[2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. 
[2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 
1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. 
[2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. 
[2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. 
[2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). 
IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. 
[2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. 
[2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. 
[2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. 
International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. 
[2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1–2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019)
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019)
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022)
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017)
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals.
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 
9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
[2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Ronen et al. [2022] Ronen, R., Tsiper, S., Anschel, O., Lavi, I., Markovitz, A., Manmatha, R.: Glass: Global to local attention for scene-text spotting. In: ECCV, pp. 249–266 (2022)
Zhong et al. [2021] Zhong, H., Tang, J., Wang, W., Yang, Z., Yao, C., Lu, T.: Arts: Eliminating inconsistency between text detection and recognition with auto-rectification text spotter. arXiv preprint arXiv:2110.10405 (2021)
Huang et al. [2022] Huang, M., Liu, Y., Peng, Z., Liu, C., Lin, D., Zhu, S., Yuan, N., Ding, K., Jin, L.: Swintextspotter: Scene text spotting via better synergy between text detection and text recognition. In: CVPR, pp. 4593–4603 (2022)
Xing et al. [2019] Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: ICCV, pp. 9126–9136 (2019)
Wang et al. [2021] Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021)
Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt: An unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018)
Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020)
Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021)
Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023)
Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017)
Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: ICLR (2020)
Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021)
Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021)
Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022)
Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022)
Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023)
He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017)
Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification - rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition - rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022)
Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng et al. [2020] Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text - rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling - rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhong, H., Tang, J., Wang, W., Yang, Z., Yao, C., Lu, T.: Arts: Eliminating inconsistency between text detection and recognition with auto-rectification text spotter. arXiv preprint arXiv:2110.10405 (2021) Huang et al. [2022] Huang, M., Liu, Y., Peng, Z., Liu, C., Lin, D., Zhu, S., Yuan, N., Ding, K., Jin, L.: Swintextspotter: Scene text spotting via better synergy between text detection and text recognition. In: CVPR, pp. 4593–4603 (2022) Xing et al. [2019] Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: ICCV, pp. 9126–9136 (2019) Wang et al. [2021] Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021) Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. 
[2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. 
[2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021) Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. 
[2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. 
In: NeurIPS, vol. 34, pp. 28522–28535 (2021)
Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022)
Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022)
Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023)
He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017)
Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022)
Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020)
Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021)
Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023)
[2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. 
[2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. 
[2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. 
[2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. 
[2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
[2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). 
IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. 
In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). 
IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
[2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. 
[2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition—RRC-MLT-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn [1955] Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text—RRC-ArT. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling—RRC-LSVT. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
[2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). 
IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. 
In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. 
[2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 
8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 
9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
[2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Zhong, H., Tang, J., Wang, W., Yang, Z., Yao, C., Lu, T.: ARTS: Eliminating inconsistency between text detection and recognition with auto-rectification text spotter. arXiv preprint arXiv:2110.10405 (2021)
Huang, M., Liu, Y., Peng, Z., Liu, C., Lin, D., Zhu, S., Yuan, N., Ding, K., Jin, L.: SwinTextSpotter: Scene text spotting via better synergy between text detection and text recognition. In: CVPR, pp. 4593–4603 (2022)
Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: ICCV, pp. 9126–9136 (2019)
Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: PGNet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021)
Bušta, M., Patel, Y., Matas, J.: E2E-MLT: An unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018)
Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020)
Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual OCR. In: CVPR, pp. 4547–4557 (2021)
Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: ECCV 2022 Workshops, Part IV, pp. 297–313 (2023)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017)
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: ICLR (2020)
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021)
Xu, Y., Zhang, Q., Zhang, J., Tao, D.: ViTAE: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021)
Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022)
Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022)
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020)
Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: DPText-DETR: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV, pp. 2961–2969 (2017)
Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: ABCNet v2: Adaptive Bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: SPTS: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification (RRC-MLT). In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition (RRC-MLT-2019). In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. 
International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: ICCV, pp. 9126–9136 (2019) Wang et al. 
[2021] Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021) Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. 
[2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. 
[2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021) Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. 
[2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition-rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022)
Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng et al. [2020] Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018)
Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020)
Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021)
Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops, Part IV, pp. 297–313 (2023)
Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017)
Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: ICLR (2020)
Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021)
Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias.
In: NeurIPS, vol. 34, pp. 28522–28535 (2021)
Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022)
Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022)
Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023)
He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017)
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 
9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. 
[2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. 
In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. 
[2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 
369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 
[2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. 
[2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. 
[2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. 
[2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. 
[2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). 
IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. 
[2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. 
[2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. 
International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. 
[2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: RelaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn [1955] Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. 
[2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
[2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Huang et al. [2022] Huang, M., Liu, Y., Peng, Z., Liu, C., Lin, D., Zhu, S., Yuan, N., Ding, K., Jin, L.: Swintextspotter: Scene text spotting via better synergy between text detection and text recognition. In: CVPR, pp. 4593–4603 (2022) Xing et al. [2019] Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: ICCV, pp. 9126–9136 (2019) Wang et al. [2021] Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021) Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt: An unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: ICLR (2020) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. 
International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: ICCV, pp. 9126–9136 (2019) Wang et al. 
[2021] Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021) Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. 
[2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. 
[2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021) Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. 
[2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. 
In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. 
[2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 
369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 
1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 
67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. 
[2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
[2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. 
[2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 
1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
[2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. 
[2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 
369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 
1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017)
Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition-rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. 
[2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. 
[2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). 
IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022)
Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn [1955] Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 
2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 
9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
[2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: ICCV, pp. 9126–9136 (2019)
Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: PGNet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021)
Bušta, M., Patel, Y., Matas, J.: E2E-MLT: An unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018)
Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020)
Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual OCR. In: CVPR, pp. 4547–4557 (2021)
Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision - ECCV 2022 Workshops, Part IV, pp. 297–313 (2023)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017)
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: ICLR (2021)
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021)
Xu, Y., Zhang, Q., Zhang, J., Tao, D.: ViTAE: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021)
Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022)
Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022)
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020)
Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: DPText-DETR: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV, pp. 2961–2969 (2017)
Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: ABCNet v2: Adaptive Bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: SPTS: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification (RRC-MLT). In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition (RRC-MLT-2019). In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Society (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun et al.
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021) Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. 
[2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. 
In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. 
[2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 
369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. 
In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. 
[2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. 
1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. 
[2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. 
Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: ABCNet v2: Adaptive Bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: SPTS: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification - RRC-MLT. In: ICDAR, vol. 1, pp. 1454–1459 (2017). IEEE
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition - RRC-MLT-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Society (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text - RRC-ArT. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling - RRC-LSVT. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals.
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
[2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  11. Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., Shi, G.: Pgnet: Real-time arbitrarily-shaped text spotting with point gathering network. In: AAAI, vol. 35, pp. 2782–2790 (2021) Bušta et al. [2018] Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. 
[2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. 
[2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Bušta, M., Patel, Y., Matas, J.: E2e-mlt-an unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018) Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020) Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 
67–83 (2018)
Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013)
Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022)
Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955)
Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Baek et al. [2020] Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020)
Huang et al. [2021] Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021)
Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023)
Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017)
Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020)
Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021)
Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021)
Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022)
Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022)
Carion et al.
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023)
He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017)
Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. 
[2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. 
[2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. 
[2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021)
Xu, Y., Zhang, Q., Zhang, J., Tao, D.: ViTAE: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021)
Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022)
Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022)
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020)
Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: DPText-DETR: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV, pp. 2961–2969 (2017)
Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: ABCNet v2: Adaptive Bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: SPTS: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification (RRC-MLT). In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition (RRC-MLT-2019). In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1–2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. 
[2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 
369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 
1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. 
[2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017)
Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022)
Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019).
IEEE
Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. 
[2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 
17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. 
IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 
9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. 
[2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 
1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Bušta, M., Patel, Y., Matas, J.: E2E-MLT: An unconstrained end-to-end method for multi-language scene text. In: ACCV, pp. 127–143 (2018)
Baek, Y., Shin, S., Baek, J., Park, S., Lee, J., Nam, D., Lee, H.: Character region attention for text spotting. In: ECCV, pp. 504–521 (2020)
Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual OCR. In: CVPR, pp. 4547–4557 (2021)
Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: ECCV 2022 Workshops, Part IV, pp. 297–313 (2023)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017)
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: ICLR (2020)
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021)
Xu, Y., Zhang, Q., Zhang, J., Tao, D.: ViTAE: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021)
Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022)
Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022)
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020)
Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: DPText-DETR: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV, pp. 2961–2969 (2017)
Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: ABCNet v2: Adaptive Bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: SPTS: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification (RRC-MLT). In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition (RRC-MLT-2019). In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
[2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual ocr. In: CVPR, pp. 4547–4557 (2021) Huang et al. [2023] Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. 
In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. 
[2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 
9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. 
[2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. 
[2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. 
[2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification — RRC-MLT. In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition — RRC-MLT-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text — RRC-ArT. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling — RRC-LSVT. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. 
[2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. 
[2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. 
In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. 
In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 
2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
[2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. 
[2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. 
[2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. 
[2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
[2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. 
International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. 
Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022)
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020)
Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: DPText-DETR: Towards better scene text detection with dynamic points in Transformer. In: AAAI (2023)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV, pp. 2961–2969 (2017)
Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: ABCNet v2: Adaptive Bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: SPTS: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let Transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification (RRC-MLT). In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition (RRC-MLT-2019). In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. 
[2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. 
[2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. 
In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. 
[2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. 
[2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. 
[2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. 
[2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. 
[2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. 
[2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR 2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017)
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR 2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019)
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019)
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022)
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 
1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 
9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. 
[2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. 
In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 
1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., Hassner, T.: A multiplexed network for end-to-end, multilingual OCR. In: CVPR, pp. 4547–4557 (2021)
Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: ECCV 2022 Workshops, Part IV, pp. 297–313 (2023)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017)
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: ICLR (2020)
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021)
Xu, Y., Zhang, Q., Zhang, J., Tao, D.: ViTAE: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021)
Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022)
Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022)
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020)
Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: DPText-DETR: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV, pp. 2961–2969 (2017)
Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: ABCNet v2: Adaptive Bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: SPTS: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification – RRC-MLT. In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition – RRC-MLT-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text – RRC-ArT. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling – RRC-LSVT. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. 
[2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. 
In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. 
[2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 
369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 
1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. 
In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. 
[2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 
369–376 (2006)
Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021)
Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: ViTAE: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021)
Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022)
Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022)
Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: DPText-DETR: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023)
He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV, pp. 2961–2969 (2017)
Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: ABCNet v2: Adaptive Bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: SPTS: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye et al.
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification (RRC-MLT). In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition (RRC-MLT-2019). In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn [1955] Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. 
[2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. 
[2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. 
[2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). 
IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. 
International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. 
[2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 
9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 
2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  15. Huang, J., Liang, K.J., Kovvuri, R., Hassner, T.: Task grouping for multilingual text recognition. In: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV, pp. 297–313 (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. 
[2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. 
[2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition-rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. 
[2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 
369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 
1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. 
[2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. 
[2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. 
[2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. 
[2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. 
[2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. 
IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 
8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 
1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 
19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 
1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  16. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS, vol. 30 (2017) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021) Xu et al. [2021] Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 
67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. 
[2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
[2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. 
[2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. 
[2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
[2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. 
[2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. 
[2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. 
[2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
[2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. 
[2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. 
[2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 
8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 
9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. 
[2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 
1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. 
[2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 
2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. 
[2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). 
In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. 
In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. 
[2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 
8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
[2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. 
In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. 
In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 
2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
[2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 
2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. 
[2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. 
[2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. 
[2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 
2961–2969 (2017)
Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022)
Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp.
457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023)
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. 
[2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. 
[2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. 
[2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). 
IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. 
[2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
[2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
  19. Xu, Y., Zhang, Q., Zhang, J., Tao, D.: Vitae: Vision transformer advanced by exploring intrinsic inductive bias. In: NeurIPS, vol. 34, pp. 28522–28535 (2021) Zhang et al. [2022] Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. 
In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. 
[2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. 
[2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. 
[2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. 
[2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 
67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. 
[2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. 
In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. 
[2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR 2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019)
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019)
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022)
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR 2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017)
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 
8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 
1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 
19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 
1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  20. Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: CVPR, pp. 9519–9528 (2022) Kittenplon et al. [2022] Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 
4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 
2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020)
Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: DPText-DETR: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV, pp. 2961–2969 (2017)
Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: ABCNet v2: Adaptive Bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: SPTS: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification - RRC-MLT. In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition - RRC-MLT-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text - RRC-ArT. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling - RRC-LSVT. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. 
IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. 
[2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 
8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 
1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 
19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 
3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  21. Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR, pp. 4604–4613 (2022) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. 
[2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 
67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. 
[2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
[2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: Dptext-detr: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. 
[2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. 
[2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. 
[2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition (RRC-MLT-2019). In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. 
[2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 
8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 
9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. 
[2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 
3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV, pp. 213–229 (2020)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: DPText-DETR: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023)
He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV, pp. 2961–2969 (2017)
Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: ABCNet v2: Adaptive Bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: SPTS: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification – RRC-MLT. In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition – RRC-MLT-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn [1955] Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng et al. [2020] Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text – RRC-ArT. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling – RRC-LSVT. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: RelaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. 
IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. 
[2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
[2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. 
[2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification (RRC-MLT). In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition (RRC-MLT-2019). In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn [1955] Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 
1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 
19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 
414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 
3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., Tao, D.: DPText-DETR: Towards better scene text detection with dynamic points in transformer. In: AAAI (2023)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV, pp. 2961–2969 (2017)
Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018)
Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: ABCNet v2: Adaptive Bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: SPTS: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification (RRC-MLT). In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition (RRC-MLT-2019). In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp.
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Bookstein [1989] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 
1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. 
In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
[2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. 
International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. 
[2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 
17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. 
IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 
19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 
414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 
3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. 
[2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. 
[2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. 
[2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. 
International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: ABCNet v2: Adaptive Bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: SPTS: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification (RRC-MLT). In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition (RRC-MLT-2019). In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
[2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. 
In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. 
[2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019)
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019)
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022)
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017)
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 
8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 
1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 
19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 
1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  25. Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989) Lyu et al. [2018] Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. 
[2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. 
Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR 2017 robust reading challenge on multi-lingual scene text detection and script identification (RRC-MLT). In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: ICDAR 2019 robust reading challenge on multi-lingual scene text detection and recognition (RRC-MLT-2019). In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR 2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR 2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: ABCNet v2: Adaptive Bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: SPTS: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). 
IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
[2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. 
[2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. 
[2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 
9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
[2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  26. Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: ECCV, pp. 67–83 (2018) Liao et al. [2021] Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 
1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu et al. [2021] Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019.
In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 
17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. 
IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. 
[2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 
8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 
1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 
19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp.
11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 
1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  27. Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(2), 532–548 (2021)
  28. Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: ABCNet v2: Adaptive Bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021)
  29. Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. In: ICLR (2021)
  30. Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: SPTS: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
  31. Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
  32. Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
  33. Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification (RRC-MLT). In: ICDAR, vol. 1, pp. 1454–1459 (2017)
  34. Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition (RRC-MLT-2019). In: ICDAR, pp. 1582–1587 (2019). IEEE
  35. Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
  36. Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
  37. Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
  38. Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
  39. Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
  40. Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
  41. Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
  42. Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
  43. Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
  44. Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
  45. Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
  46. Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
  47. Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
  48. Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
  49. Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
  50. Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
  51. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
  52. Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
  53. Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
  54. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
  55. Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
  56. Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
  57. Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
  58. Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
  59. Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
  60. Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
  61. Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
  62. Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
  63. Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
  64. Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
  65. Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
  66. Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
  67. Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
  68. Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
  69. Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
  70. Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
  71. Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
  72. Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
  73. Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
  74. Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
  75. Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
  76. Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
  77. Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. 
[2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. 
[2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. 
[2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. 
[2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. 
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification (RRC-MLT). In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition (RRC-MLT-2019). In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. 
[2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 
3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
  28. Liu, Y., Shen, C., Jin, L., He, T., Chen, P., Liu, C., Chen, H.: Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8048–8064 (2021) Zhu et al. [2021] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. 
[2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021) Peng et al. [2022] Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. 
Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 
414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022) Ye et al. [2023] Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. 
[2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. 
[2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023) Zhang et al. [2019] Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). 
IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
[2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. 
[2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. 
In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. 
[2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 
414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 
414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 
1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. 
In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. In: ICLR (2021)
Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: Spts: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. 
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 
414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 
414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. 
In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 
1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Peng, D., Wang, X., Liu, Y., Zhang, J., Huang, M., Lai, S., Li, J., Zhu, S., Lin, D., Shen, C., et al.: SPTS: Single-point text spotting. In: ACM MM, pp. 4272–4281 (2022)
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification (RRC-MLT). In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition (RRC-MLT-2019). In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: RelaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. 
[2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). 
IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. 
In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 
369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 
1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. 
[2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 
8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. 
arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 
5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 
5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. 
International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ye, M., Zhang, J., Zhao, S., Liu, J., Liu, T., Du, B., Tao, D.: DeepSolo: Let transformer decoder with explicit points solo for text spotting. In: CVPR, pp. 19348–19357 (2023)
Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: ICDAR 2019 robust reading challenge on reading Chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE
Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification — RRC-MLT. In: ICDAR, vol. 1, pp. 1454–1459 (2017)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition — RRC-MLT-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text — RRC-ArT. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling — RRC-LSVT. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. 
International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. 
[2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 
9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 
2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  32. Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: ICDAR, pp. 1577–1581 (2019). IEEE Nayef et al. [2017] Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. 
[2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. 
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-L., et al.: ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition (RRC-MLT-2019). In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. 
[2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 
414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 
1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 
9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. 
[2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. 
In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 
1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  33. Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., Chazalon, J., et al.: Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: ICDAR, vol. 1, pp. 1454–1459 (2017) Nayef et al. [2019] Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE Lorentz [2013] Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc., ??? (2013) Liu et al. [2022] Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022) Du et al. [2022] Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 
8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
[2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. 
arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 
5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 
34. Nayef, N., Patel, Y., Busta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., Matas, J., Pal, U., Burie, J.-C., Liu, C.-l., et al.: Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition—rrc-mlt-2019. In: ICDAR, pp. 1582–1587 (2019). IEEE
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 
414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 
414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Soc. (2013)
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: Dynamic anchor boxes are better queries for DETR. In: ICLR (2022)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. 
In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
19. Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
20. Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
21. Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
22. Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
23. Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
24. Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
25. Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
26. Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
27. Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
28. Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
29. Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
30. Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
31. Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
32. Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
33. Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
34. Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
35. Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
36. Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: Dab-detr: Dynamic anchor boxes are better queries for detr. In: ICLR (2022)
37. Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022)
38. Kuhn, H.W.: The hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
39. Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
40. Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
41. Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
42. Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
43. Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
44. Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
45. Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
46. Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
47. Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
48. Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
49. Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE
50. Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE
51. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
52. Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
53. Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
54. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
55. Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
56. Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
57. Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
58. Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
59. Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
60. Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision 130(8), 1961–1977 (2022) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
[2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 
414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. 
IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
References

Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: Toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? 
arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
[2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. 
[2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al.
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
[2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  38. Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Graves et al. [2006] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. [2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006) Lin et al. 
[2017] Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). 
IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. 
In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) Feng et al. [2022] Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. 
[2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 
8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. 
[2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 
9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
[2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 
2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
39. Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022)
Ch'ng, C.-K., Chan, C.S., Liu, C.-L.: Total-Text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? 
arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
[2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. 
[2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. 
[2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 
414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. 
In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 
1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  41. Feng, Z., Guo, S., Tan, X., Xu, K., Wang, M., Ma, L.: Rethinking efficient lane detection via curve modeling. In: CVPR, pp. 17062–17070 (2022) Ch’ng et al. [2020] Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. 
[2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 
2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019)
Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. 
[2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. 
arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 
5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  42. Ch’ng, C.-K., Chan, C.S., Liu, C.-L.: Total-text: toward orientation robustness in scene text detection. IJDAR 23(1), 31–52 (2020) Karatzas et al. [2015] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). 
Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. 
arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 
5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? 
arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017)
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 
1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  43. Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: Icdar 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015) Liu et al. [2019] Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. 
[2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. 
[2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. 
arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 
5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
[2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
[2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
  44. Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90, 337–345 (2019) Tang et al. [2019] Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition 96, 106954 (2019) Karatzas et al. [2013] Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). 
IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: Icdar 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013) Singh et al. [2021] Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. 
[2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. 
In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. 
In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 
2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? 
arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
[2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 
1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  45. Tang, J., Yang, Z., Wang, Y., Zheng, Q., Xu, Y., Bai, X.: Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition 96, 106954 (2019)
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. 
[2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
[2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
46. Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
47. Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021)
48. Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
49. Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
50. Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
51. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
52. Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
53. Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
54. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
55. Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
56. Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
57. Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
58. Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
59. Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
60. Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
61. Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
62. Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. 
[2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. 
arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 
5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 
34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  47. Singh, A., Pang, G., Toh, M., Huang, J., Galuba, W., Hassner, T.: Textocr: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text. In: CVPR, pp. 8802–8812 (2021) Fang et al. [2022] Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022) Chng et al. [2019] Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT). In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling (RRC-LSVT). In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 
3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
48. Fang, S., Mao, Z., Xie, H., Wang, Y., Yan, C., Zhang, Y.: Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–18 (2022)
Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE
Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. 
[2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 
1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  49. Chng, C.K., Liu, Y., Sun, Y., Ng, C.C., Luo, C., Ni, Z., Fang, C., Zhang, S., Han, J., Ding, E., et al.: Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: ICDAR, pp. 1571–1576 (2019). IEEE Sun et al. [2019] Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-lsvt. In: ICDAR, pp. 1557–1562 (2019). IEEE Loshchilov and Hutter [2019] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. 
[2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. 
[2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. 
arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 
5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Zhang et al.
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 
34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
50. Sun, Y., Ni, Z., Chng, C.-K., Liu, Y., Luo, C., Ng, C.C., Han, J., Ding, E., Liu, J., Karatzas, D., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling - RRC-LSVT. In: ICDAR, pp. 1557–1562 (2019). IEEE
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: DETRs with hybrid matching. In: CVPR, pp. 19702–19712 (2023)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  51. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) Wang et al. [2022] Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. 
[2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
[2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. 
[2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. 
Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). 
In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  52. Wang, W., Zhang, J., Cao, Y., Shen, Y., Tao, D.: Towards data-efficient detection transformers. In: ECCV, pp. 88–105 (2022). Springer Jia et al. [2023] Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 
516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017)
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, pp. 516–522 (2020)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 
12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). 
IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  53. Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: CVPR, pp. 19702–19712 (2023) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. 
arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 
5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 
11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. 
[2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: TextDragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text Perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: MANGO: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: AE TextSpotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). 
In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  54. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Schaeffer et al. [2023] Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. 
In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. 
[2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. 
[2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
  55. Schaeffer, R., Miranda, B., Koyejo, S.: Are emergent abilities of large language models a mirage? arXiv preprint arXiv:2304.15004 (2023) Shi et al. [2017] Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017) Deng et al. [2018] Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. 
arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 
5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
[2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
[2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. 
[2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. 
  41. Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
  42. Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
  43. Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
  44. Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
  45. Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
  46. Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
  47. Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
  48. Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
  49. Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
  50. Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
  51. Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
  52. Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
  53. Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
  54. Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
  55. Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  56. Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 2550–2558 (2017)
  57. Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018)
  58. Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
  59. Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. 
[2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 
11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 
9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). 
In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  57. Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: AAAI, vol. 32 (2018) Ma et al. [2021] Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021) Qin et al. [2021] Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021) Baek et al. [2019] Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019) Wang et al. [2020] Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020) Zhang et al. [2020] Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. 
[2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020) Ye et al. [2020] Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
Ma, C., Sun, L., Zhong, Z., Huo, Q.: Relatext: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recognition 111, 107684 (2021)
Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 
1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
59. Qin, X., Zhou, Y., Guo, Y., Wu, D., Tian, Z., Jiang, N., Wang, H., Wang, W.: Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection. In: ACM MM, pp. 414–423 (2021)
60. Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
61. Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
62. Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
63. Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
64. Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
65. Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
66. Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
67. Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
68. Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
69. Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
70. Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
71. Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
72. Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
73. Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
74. Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
75. Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
76. Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
77. Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 
1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
60. Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020) Zhang et al. [2022] Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. 
[2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 
5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. 
[2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). 
In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  61. Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z., Zhang, Y.: Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11753–11762 (2020)
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022) Zhu et al. [2021] Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. 
[2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 
457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021) Liao et al. [2022] Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. 
In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. [2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. 
[2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. 
[2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. 
[2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 
1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  62. Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., Yin, X.-C.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9699–9708 (2020)
International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) Wang et al. 
[2022] Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022) Tang et al. [2022] Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022) Zhang et al. [2023] Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023) Feng et al. [2019] Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. 
[2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  63. Ye, J., Chen, Z., Liu, J., Du, B.: Textfusenet: Scene text detection with richer fused features. In: IJCAI, vol. 20, pp. 516–522 (2020)
  64. Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
  65. Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
  66. Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
  67. Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: Tpsnet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
  68. Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
  69. Zhang, Q., Xu, Y., Zhang, J., Tao, D.: Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
  70. Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019)
  71. Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020)
  72. Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
  73. Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021)
  74. Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
  75. Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020)
  76. Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE
  77. Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 
1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  64. Zhang, S.-X., Zhu, X., Yang, C., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. arXiv preprint arXiv:2205.05320 (2022)
  65. Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3123–3131 (2021)
  66. Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
  67. Wang, W., Zhou, Y., Lv, J., Wu, D., Zhao, G., Jiang, N., Wang, W.: TPSNet: Reverse thinking of thin plate splines for arbitrary shape scene text representation. In: ACM MM, pp. 5014–5025 (2022)
  68. Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., Bai, X.: Few could be better than all: Feature sampling and grouping for scene text detection. In: CVPR, pp. 4563–4572 (2022)
  69. Zhang, Q., Xu, Y., Zhang, J., Tao, D.: ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. International Journal of Computer Vision, 1–22 (2023)
  70. Feng, W., He, W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9076–9085 (2019) Qiao et al. [2020] Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. 
[2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  71. Qiao, L., Tang, S., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Text perceptron: Towards end-to-end arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 11899–11907 (2020) Wang et al. [2020] Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. 
[2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  72. Wang, H., Lu, P., Zhang, H., Yang, M., Bai, X., Xu, Y., He, M., Wang, Y., Liu, W.: All you need is boundary: Toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020) Qiao et al. [2021] Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. 
[2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  73. Qiao, L., Chen, Y., Cheng, Z., Xu, Y., Niu, Y., Pu, S., Wu, F.: Mango: A mask attention guided one-stage scene text spotter. In: AAAI, vol. 35, pp. 2467–2476 (2021) Liu et al. [2018] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. 
[2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  74. Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018) Wang et al. [2020] Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. 
IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  75. Wang, W., Liu, X., Ji, X., Xie, E., Liang, D., Yang, Z., Lu, T., Shen, C., Luo, P.: Ae textspotter: Learning visual and linguistic representation for ambiguous text spotting. In: ECCV, pp. 457–473 (2020) Shi et al. [2017] Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  76. Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: Icdar2017 competition on reading chinese text in the wild (rctw-17). In: ICDAR, vol. 1, pp. 1429–1434 (2017). IEEE Ma et al. [2018] Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018) Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
  77. Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia 20(11), 3111–3122 (2018)
Authors (7)
  1. Maoyuan Ye
  2. Jing Zhang
  3. Shanshan Zhao
  4. Juhua Liu
  5. Tongliang Liu
  6. Bo Du
  7. Dacheng Tao